-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-7660] Wrap SnappyOutputStream to work around snappy-java bug #6176
Conversation
Merged build triggered. |
Merged build started. |
@JoshRosen does it make sense to re-enable those flakey tests in this patch? |
private[this] var closed: Boolean = false | ||
|
||
override def write(b: Int): Unit = { | ||
if (closed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are these guarding against? I understand you'd need it in close, but here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the unlikely event that we wrote to a closed stream, this would end up mutating a buffer that might be shared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the upstream fix also incorporates this: xerial/snappy-java#108 (comment)
Merged build finished. Test FAILed. |
Test FAILed. |
Jenkins, retest this please. |
@pwendell, my hotfix didn't disable the tests; instead, it added a workaround in another test suite that clears the pool of reusable Snappy buffers, fixing the JavaAPISuite failures (I tested this locally before pushing the hotfix). |
Merged build triggered. |
Merged build started. |
Test build #32793 has started for PR 6176 at commit |
Test build #32793 has finished for PR 6176 at commit
|
Merged build finished. Test PASSed. |
Test PASSed. |
@JoshRosen ah totally understand - got it. |
*/ | ||
private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends OutputStream { | ||
|
||
private[this] var closed: Boolean = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checking - this doesn't need to be volatile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point. I guess this needs to be volatile in case we're performing cleanup in another thread. @rxin, if this is volatile, won't that make the write()
checks way more expensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already do per-record checking of volatile vars.
On Fri, May 15, 2015 at 4:40 PM, Josh Rosen notifications@github.com
wrote:
In core/src/main/scala/org/apache/spark/io/CompressionCodec.scala
#6176 (comment):}
override def compressedInputStream(s: InputStream): InputStream = new SnappyInputStream(s)
}
+
+/**
- * Wrapper over [[SnappyOutputStream]] which guards against write-after-close and double-close
- * issues. See SPARK-7660 for more details. This wrapping can be removed if we upgrade to a version
- * of snappy-java that contains the fix for SnappyOutputStream.close() is not idempotent xerial/snappy-java#107.
- */
+private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends OutputStream {
+- private[this] var closed: Boolean = false
Ah, good point. I guess this needs to be volatile in case we're performing
cleanup in another thread. @rxin https://github.com/rxin, if this is
volatile, won't that make the write() checks way more expensive?—
Reply to this email directly or view it on GitHub
https://github.com/apache/spark/pull/6176/files#r30454371.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this offline: as far as we know, this is only ever executed in a single-threaded context (since tasks are single-threaded and their stop / cleanup logic is performed in that same thread), so the volatile should not be necessary.
Josh this LGTM - if there are no outstanding comments we should try to get this in since it's one of the few remaining blockers. |
LGTM. |
I'm going to merge this into master and branch-1.4, then will cherry-pick back to 1.3.x and 1.2.x. Thanks for the reviews. |
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix. Author: Josh Rosen <joshrosen@databricks.com> Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits: 8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660 (cherry picked from commit f2cc6b5) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix. Author: Josh Rosen <joshrosen@databricks.com> Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits: 8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660 (cherry picked from commit f2cc6b5) Signed-off-by: Josh Rosen <joshrosen@databricks.com> Conflicts: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala core/src/test/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriterSuite.java
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix. Author: Josh Rosen <joshrosen@databricks.com> Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits: 8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660 (cherry picked from commit f2cc6b5) Signed-off-by: Josh Rosen <joshrosen@databricks.com> Conflicts: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala core/src/test/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriterSuite.java
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits: 8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits: 8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix. Author: Josh Rosen <joshrosen@databricks.com> Closes apache#6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits: 8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660
This patch wraps
SnappyOutputStream
to ensure thatclose()
is idempotent and to guard against write-after-close()
bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotentclose()
method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix.